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A novel two-dimensional parallel computing method for real-time Kalman filtering is 
presented. The mathematical formulation of a Kalman filter algorithm is rearranged to 
be the type of Faddeev algorithm for generalizing signal processing. The dataflow mapping 
from the Faddeev algorithm to a two-dimensional concurrent computing structure is 
developed. The architecture of the resulting processor cells is regular , simple , expand- 
able , and therefore naturally suitable for VLSI chip implementation. The computing 
methodology and the two-dimensional systolic arrays are useful for Kalman filter appli- 
cations as well as other matrix/vector based algebraic computations. 


I. Introduction 

For more than 20 years, the Kalman filter has been applied 
extensively in many signal processing applications, including 
target tracking, adaptive controls, radar-signal processing, and 
failure-proof systems (Refs. 1 to 4). The applicability of the 
Kalman filter to real-time signal processing problems is in 
general limited by the relatively complex mathematical opera- 
tions necessary in computing the Kalman filtering algorithm 
(Refs. 5 and 6). However, with the rapid development of 
VLSI integrated circuits (Refs. 7 to 9), it is quickly becoming 
technologically feasible to implement Kalman filters in real 
time. 

The processing of a Kalman filter requires matrix/ vector 
operations such as multiplication, addition, subtraction, and 
inversion. Among these, matrix inversion is the most difficult 
to implement in terms of speed and accuracy. The Faddeev 
algorithm (Refs. 10 and 11) has been suggested as a universal 


algorithm for various matrix manipulations due to the fact 
that it is easily systematized for matrix calculations and maps 
easily into a concurrent systolic array architecture. It is 
natural to arrange Kalman filter algorithms into a form of 
Faddeev algorithm to maximize the capability of hardware 
implementation of systolic arrays. First of all, this proposed 
algorithm avoids the direct matrix inverse computation as the 
usual back substitution in the Kalman Filter implementation, 
but obtains the values of desired results at the end of the 
forward course of computation, resulting in a considerable 
saving in added processing and storage. Secondly, since the 
Gaussian elimination procedure is applied through the compu- 
tation, numerical stability is obtained. Thirdly, the parallel, 
modular computer architecture which consists of systolic 
processors provides simultaneous high throughput and a 
capability for a wide variety of linear algebraic operations. 

In the following sections, we describe, in order, Kalman 
filters, the Faddeev algorithm, the implementation, a two- 
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dimensional systolic array architecture, and a numerical 
example. 


x(k/k) = x(k/k - 1) + K(k) Az(k) 


( 8 ) 


II. Kalman Filter 

Kalman filters have been shown to be the optimal linear 
estimator in the least square sense for estimating dynamic 
system states in linear systems. The Kalman filter updates 
state estimation based on prior estimates and observed mea- 
surements. It consists of the model of the dynamic process 
which performs the function of prediction and a feedback 
correction scheme. The measurements can be processed as 
they occur, and there is no need to store any measurement 
data. However, all the associated matrices which describe the 
system dynamic, measurement system, and noise is assumed 
to be known. The conventional discrete time-varying Kalman 
filtering process involves the propagation of state estimates 
and error covariance matrices from time sample to next 
time sample. Discussions and applications on Kalman filters 
can be found widely in literature (Refs. 1 to 6). 

The following equations define a general dynamic system 
and a measurement system. 

x(k + l) = 4>( k)x(k) + w(k) (1) 

z(k) = H(k)x(k) + v(k) (2) 

In Eq. (1), 0(k) is a n X n matrix called the state transi- 
tion matrix which describes the plant, x(k) is the state vector 
with n -dimension, and w(k) is a n -vector called the distur- 
bance. In Eq. (2), z(k) is a m-vector termed the measurement 
vector; v(k) is also a m -vector called the measurement noise 
vector, and H(k) is a m X n matrix called the measurement 
matrix. The disturbance w(k) and noise v(k) are assumed to 
be zero-mean Gaussian white noise sequences with covariance 
matrices Q(k) and R(k), respectively. Furthermore, sequences 
w(k) and v(k) are assumed to be statistically independent. 
The Kalman filter is described by the following equations 
under assumptions that matrices 0 (k), H(k), R(k), and 
Q(k) are known. 


k = 1,2,--- 

Initial conditions are given as follows: 

x(0/0) = 0 (9) 

P(0/0) = P(0) or P- 1 (0/0) = P 1 (0) (10) 

Note that matrices P(k/k) and P(k + 1/k) are commonly called 
error covariance matrices. The vector x(k/k) represents the 
optimal estimate of x(k) based on the measurement sequence 
(z(l),z(2), z(3), • • • , z(k)}. Equations (3) and (4) are referred 
to as time updates and Eqs. (5), (6), and (8) are referred 
to as measurement updates. These filter equations are com- 
puted in order as listed from Eqs. (3) to (8). 


III. Faddeev Algorithm 

Consider a linear equation: 

AX = B (11) 

Suppose that it is desired to find CX + D, a linear combination 
of X without first finding X, which is an unknown. This can 
be represented in matrix form as: 

A B 

( 12 ) 

-C D 

where A, B, C, and D are in a trices ox vectois. The dimension 
of the matrices will be discussed later. By adding suitable 
combinations of A and B to -C and D, the term “-C + WA” 
appears in the lower left hand quadrant and “D + WB” in the 
bottom right hand quadrant. It shows in matrix form as: 


x(k/k - 1) = 0 (k - 1) x(k - 1/k - 1) (3) 

P(k/k - 1) = 0(k - l)P(k - 1/k - 1) 0 T (k - 1) + Q(k - 1) 

( 4 ) 

P' 1 (k/k) = P' 1 (k/k - 1) + H r (k) R' 1 (k) H(k) (5) 



K(k) = P(k/k)H r (k)R' 1 (k) 
Az(k) = z(k) - H(k)x(k/k - 1) 


(6) If W specifies the appropriate linear combination such that 
the lower left hand side of Eq. (13) is zero, then CX + D will 

(7) appear on the lower right hand side. 
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That is, 


1st pass: 


i 

< 

u 

II 

£ 

(14) 

I 

x(k - 1/k - 1) 

D + WB = D + CA" 1 B 


-*<k-l) 

0 

= CX + D 

05) 

2nd pass: 



x(k/k - 1) 


Representing the above equation in matrix form, we obtain: 


A 

B 

0 

CX + D 


06 ) 


Note that CX + D is the desired result and appears in the 
bottom right hand quadrant after the process to annul the 
bottom left hand quadrant. 

It is clear from the above illustration that given matrices A, 
B, C and D, CA~* B + D can be computed. Consequently, by 
proper selection of matrices A, B, C, and D, the capability of 
the Faddeev algorithm can be fully utilized. The various 
possible matrix manipulations are shown in Fig. 1 . 

The simplicity of the algorithm is due to the absence of a 
necessity to actually identify the multipliers of the rows of 
A and the elements of B; it is only necessary to “annul the 
last row.” This can be done by triangularization, a numeri- 
cally stable procedure, combined with an equally stable Gaus- 
sian elimination procedure (Refs. 12 and 13). 

One of the more important features of this algorithm is 
that it avoids the usual back substitution or solution to the 
triangular linear system and obtains the values of the unknowns 
directly at the end of the forward course of the computation, 
resulting in a considerable savings in added processing and 
storage. 


IV. Implementation 

Kalman filter algorithms are adjusted to be the type of 
Faddeev algorithm in this section (Ref. 14). Computations are 
cyclically propagated through the following ordered set of 
passes. Note that the new data could be shifted into the 
array from the top, row by row as the calculation proceeds, 
so that there would be no delay in starting the next matrix 
computation. 


P' 1 (k-l/k-l) 

« r (k-l) 

-<f>(k-l) 

Q(k - 1) 


3rd pass: 


R(k) 

I 

-H r (k) 

0 


4th pass: 

P(k/k - 1) 

I 


-I 

0 


5th pass: 

I 


H(k) 

-H r (k) R" 1 

00 

P- 1 (k/k - 1 ) 


6th pass: 


P" 1 (k/k) 

H t (k) R -1 (k) 

-I 

0 


P(k/k - 1) 


H r (k) IT 1 (k) 


P' 1 (k/k - 1) 


P-' (k/k) 


K(k) 


7th pass: 


I 

x(k/k - 1) 

H(k) 

z(k) 

8th pass: 


I 

Az(k) 

-K(k) 

x(k/k - 1) 


Az(k) 


— x(k/k) 


Note that the results (lower right quadrant) of each pass in 
general, needs to be stored and is used in later passes as new 
entries. 
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V. Architecture 

A general architecture is proposed to process all eight 
passes as mentioned in the previous section. To fit the type of 
Faddeev algorithm, the number of columns of the matrix 
(lower left quadrant) should be less than or equal to that of 
the matrix (upper left quadrant). Fortunately, all eight passes 
meet this requirement so that the matrix (lower left quadrant) 
will be annulled row by row. This can be done by ordinary 
Gaussian elimination. However, Gaussian elimination in gen- 
eral requires pivoting, and the pivoting strategy is not suited 
to a systolic array since it may require global communication 
for the pivot section. The neighbor pivoting technique intro- 
duces a zero to a row by subtracting a multiple of an adjacent 
row from it, interchanging the two rows when necessary to 
prevent the multiple from exceeding unity (Ref. 13). 

This process is shown in the modified Faddeev algorithm 
(a two-step procedure): 


A 

B 

step 1 T 

MB 

-c 

D 

-C 

D 

T 

MB 

step 2 T 

MB 

-c 

D 

-C + WT 

D + WMB 


where T is an upper triangular, and M is a nonsingular (trian- 
gular izat ion) matrix. Since W = CT - * 1 , the final result is G = 
D + WMB = D + CA" 1 B. In order to triangularize matrix A 
and to annul matrix C, it is necessary to have a two-step 
procedure. First, A is triangularized by the neighbor pivoting 
process (simultaneously applied to B); second, C is annulled 
by Gaussian elimination using the diagonal elements of T as 
pivot elements. The square processor arrangement for both 
triangularization and annulling steps is shown in Fig. 2 
(Refs. 10 and 11). However, to compute all eight passes, the 
size of the processor is 2 n cells (row) by 2 n ceils (column). 


bearing, and bearing rate as four components (in order) of 
the state vector x(k). The maneuver noise, which impacts on 
the change in range rate and bearing rate, is assumed to be 
zero-mean white, and the range rate and bearing rate are 
uncorrelated and with variances of 330 and 1.3 X 10 -8 , 
respectively. The radar sensors are assumed to provide noisy 
measurements of the range and bearing. The noise of the 
radar sensors are assumed to be zero-mean white, and the 
range rate and bearing rate are uncorrelated and with variances 
of 10 6 and 2.89 X 10" 4 , respectively. The values of the 
matrices and initial values are listed as follows: 


<t>(k) 


1 15 0 0 

0 10 0 

0 0 1 15 

0 0 0 1 


H(k) 


1 

0 


0 0 
0 1 


0 

0 


QOO 


oooo 

0 330 0 0 

0 0 0 0 

0 0 0 1.3 X 1CT 8 


R(k) = 


10 6 

0 


2.89 X 10 


-4 


P" 1 (0/0) 


0.2 X 10- 5 -0.149 X 1(H 0 0 

-0.149 X 10" 4 0.222 X 1CH 0 0 

0 0 0.662 X 10 4 -0.483 X 10 s 

j_ 0 0 -0.453 X iCr 0.735 X i0 e 


It is desirable to have a fixed size processor to handle all 
eight passes. However, the size of the entry matrix/vector 
varies from pass to pass. By padding zeros in appropriate 
places, the In ceils (row) by In ceils (column) become the 
proper size for implementing Kalman filters. The scheme of 
zero padding is illustrated in Fig. 3. By using the fixed size 
2 n X 2 n processor arrays, the estimate x(k/k) can be updated 
in each 1 6n time unit (assuming that it takes one time unit to 
operate data in a cell.) 

VI. Numerical Example 

An example of air-traffic-control radar is chosen for numer- 
ical simulation. In this example, one has range, range rate, 


5(0/0) = 


0.200 X 10 4 

0 

0.500 

0 


This example was simulated on a VAX 780. The Kalman 
gain K n (k), the range error covariance P n (k/k - 1), and the 
bearing error covariance ^33 (k/k - 1) were plotted against 
time index k and shown in Fig. 4. These plots show the 
numerical stability of the implementation of the Kalman 
Filter on the concurrent systolic arrays. 
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Fig. 1. Examples of a variety of matrix operations 
possible with the Faddeev Algorithm 
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IF | x fn | > lx|, THEN 
s = 1 

m = -x/x jn (x jn * 0) 
m = 0 (x jn = 0) 

s — 0 

m = -x in /x 


(b) 

BOUNDARY CELL 


IF x * 0, m = Xj n /x 
IF x = 0, m = x 
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OUTPUT MATRIX 


Fig. 2. Data Flow and PE arrangement: (a) triangularization process, n = 4; 
(b) annulling process, n = 4 
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